Tags: topic: programming tools and libraries


  1. Simon Willison discusses why requesting HTML rather than Markdown as an LLM output format can significantly enhance technical explanations. While token constraints previously favored Markdown, modern models benefit from the ability of HTML to incorporate SVG diagrams, interactive widgets, and improved navigation. The article provides prompt examples for reviewing pull requests via HTML artifacts and showcases a GPT-5.5 generated explanation of a Linux security exploit that uses CSS and JavaScript to create a rich documentation experience.
  2. This article provides a technical guide on implementing permission gating for AI agents using Python to mitigate the risks of autonomous tool execution. It describes how to create an interception layer that requires explicit human authorization before any sensitive or high-impact tools are called, ensuring safer agentic workflows.
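The interception pattern the article describes can be sketched in a few lines; the wrapper, example tool, and approver names below are illustrative assumptions, not the article's actual code.

```python
# Minimal sketch of a permission-gating layer for agent tool calls.
# `gate`, `delete_file`, and the approver callback are illustrative
# assumptions, not the article's exact implementation.
from typing import Any, Callable

class PermissionDenied(Exception):
    pass

def gate(tool: Callable[..., Any],
         approver: Callable[[str, dict], bool],
         sensitive: bool = True) -> Callable[..., Any]:
    """Wrap a tool so sensitive calls require explicit human approval."""
    def wrapped(**kwargs: Any) -> Any:
        if sensitive and not approver(tool.__name__, kwargs):
            raise PermissionDenied(f"{tool.__name__} blocked by reviewer")
        return tool(**kwargs)
    return wrapped

def delete_file(path: str) -> str:          # example high-impact tool
    return f"deleted {path}"

# In production the approver would prompt a human; it is injected as a
# callback here so the flow is easy to test without a console.
console_approver = lambda name, args: input(f"Allow {name}({args})? [y/N] ") == "y"
```

Injecting the approver keeps the gate testable: a CI run can pass a deny-all stub while interactive sessions use the console prompt.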
  3. The Mintlify CLI has evolved from a simple local preview tool into a powerful terminal interface for managing documentation workflows. With the introduction of mint analytics, developers can now access page views, search queries, and user feedback directly through the command line, enabling seamless integration with coding agents like Claude Code to automate content updates and identify gaps. The update also enables search and AI assistant functionality within local previews and introduces new authentication commands for better session management.
    Main topics:
    - mint analytics for structured documentation data
    - agent-driven development using CLI output
    - search and AI assistant support in local dev environments
    - improved identity management via mint login/logout
  4. gitcrawl is a local-first GitHub triage tool and a drop-in caching shim for the gh CLI. It mirrors repository issues and pull requests into a local SQLite database, enabling semantic clustering and full-text search while preventing API rate limit exhaustion. This setup allows maintainers and AI agents to perform heavy read operations against a local cache rather than live GitHub servers.
    Main features:
    - Local SQLite storage for all issue, PR, and commit metadata.
    - A gh-compatible shim that handles most read-only calls locally.
    - Semantic clustering using OpenAI embeddings to group related reports.
    - An interactive terminal UI for cluster browsing.
    - JSON support for easy automation with AI agents.
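The caching idea behind the shim can be sketched as a read-through SQLite cache; the table layout and function names here are assumptions, not gitcrawl's actual schema.

```python
# Read-through cache sketch of the gh-shim idea: serve read-only calls
# from local SQLite and fall back to a fetcher (the real `gh` CLI, in
# gitcrawl's case) only on a miss or stale entry.
import json
import sqlite3
import time
from typing import Callable

def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS gh_cache (
                    endpoint TEXT PRIMARY KEY,
                    payload  TEXT NOT NULL,
                    fetched  REAL NOT NULL)""")
    return db

def cached_get(db: sqlite3.Connection, endpoint: str,
               fetch: Callable[[str], dict], max_age: float = 3600.0) -> dict:
    row = db.execute("SELECT payload, fetched FROM gh_cache WHERE endpoint=?",
                     (endpoint,)).fetchone()
    if row and time.time() - row[1] < max_age:
        return json.loads(row[0])          # cache hit: no API call
    data = fetch(endpoint)                 # miss: hit GitHub once
    db.execute("INSERT OR REPLACE INTO gh_cache VALUES (?, ?, ?)",
               (endpoint, json.dumps(data), time.time()))
    return data
```

Heavy agent read loops then cost one API call per endpoint per `max_age` window, which is what keeps rate limits intact.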
  5. # Incident Post-Mortem: Multi-Agent Credential Exfiltration Wave
    **Date:** April 30, 2026
    **Severity:** Critical (P1)
    **Status:** Resolved / Patched
    **Impacted Systems:** OpenAI Codex, Anthropic Claude Code, GitHub Copilot, Google Vertex AI

    ---

    ## 1. Executive Summary
    Over a nine-month period leading up to April 2026, multiple research teams identified critical vulnerabilities across the industry's leading AI coding agents. Contrary to previous assumptions regarding "model hallucinations," these attacks did not target model logic; instead, they targeted **runtime credentials**. Attackers exploited the gap between the user interface and the underlying identity/authorization plane, allowing for unauthorized shell execution, sandbox escapes, and full repository takeovers via hijacked OAuth tokens and excessive service permissions.

    ## 2. Incident Overview
    | Feature | Description |
    | :--- | :--- |
    | **Primary Attack Vector** | Credential theft and privilege escalation through agentic runtime environments. |
    | **Core Vulnerability Class** | Broken Access Control; Improper Input Sanitization (Command Injection); Excessive Scoping. |
    | **Detection Gap** | AI agents are currently invisible to standard IAM, CMDB, and asset inventory tools. |

    ## 3. Root Cause Analysis (RCA)

    ### A. Codex: Command Injection via Parameter Obfuscation
    * **Mechanism:** Maliciously crafted GitHub branch names containing semicolon/backtick subshells were passed unsanitized into setup scripts during cloning.
    * **Stealth Tactic:** Attackers used Unicode U+3000 (Ideographic Space) to make malicious branches appear identical to "main" in web portals, hiding the exfiltration payload from human reviewers.
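A mitigation consistent with this finding is to treat branch names as untrusted input before they reach any setup script. A minimal sketch, with an allow-list rule that is illustrative rather than prescriptive:

```python
# Sketch: reject shell metacharacters and confusable whitespace such as
# U+3000 in branch names before they are interpolated into any script.
import re
import unicodedata

SAFE_BRANCH = re.compile(r"^[A-Za-z0-9._/-]+$")   # conservative allow-list

def validate_branch(name: str) -> str:
    # NFKC-normalize first so visually identical strings (e.g. a name
    # padded with U+3000 to mimic "main") no longer compare equal.
    normalized = unicodedata.normalize("NFKC", name)
    if normalized != name or not SAFE_BRANCH.match(name):
        raise ValueError(f"refusing suspicious branch name: {name!r}")
    return name
```

The allow-list blocks semicolon/backtick subshells outright, and the normalization check catches the U+3000 homoglyph tactic described above.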

    ### B. Claude Code: Sandbox & Logic Bypass
    * **CVE-2026-25723:** Escaped project sandbox via unvalidated command chaining (piped `sed`/`echo`).
    * **CVE-2026-33068:** Permission modes were resolved from `.claude/settings.json` *before* the workspace trust dialog appeared, allowing repos to auto-disable security prompts.
    * **Performance Trade-off:** A logic flaw caused the agent to stop enforcing "deny rules" once a command chain exceeded 50 subcommands to optimize for speed.
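The corrected behavior, checking every subcommand with no length cutoff, can be sketched as follows; the deny list and operator splitting are simplified illustrations, not Claude Code's real parser.

```python
# Sketch of deny-rule enforcement over an entire command chain. Every
# segment is checked; there is deliberately no subcommand-count cutoff.
# Splitting on shell operators with a regex is a simplification; robust
# enforcement needs a real shell grammar.
import re

DENY = {"curl", "wget", "nc"}   # illustrative deny list

def violates_deny_rules(command_chain: str) -> bool:
    # Split on ;, &&, ||, | and newlines, then check each segment.
    segments = re.split(r"(?:\|\||&&|[;|\n])", command_chain)
    for seg in segments:
        tokens = seg.strip().split()
        if tokens and tokens[0] in DENY:
            return True
    return False
```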

    ### C. GitHub Copilot: Prompt Injection in Metadata
    * **Mechanism:** Instructions hidden within Pull Request descriptions or GitHub Issues triggered Remote Code Execution (RCE) or forced the agent into an unrestricted "auto-approve" mode via `.vscode/settings.json` manipulation.

    ### D. Vertex AI: Excessive Default Scoping
    * **Mechanism:** The default service identity (P4SA) possessed overly broad OAuth scopes, granting agents access to sensitive Google services (Gmail, Drive) and internal Artifact Registries by design rather than exception.

    ## 4. Lessons Learned
    1. **Interface ≠ System Security:** Enterprises have been approving AI *interfaces* without auditing the underlying *identities* those interfaces wield.
    2. **Agent-Runtime vs. Code-Output:** Current security focus is on scanning the code an AI *writes*; however, the real threat vector is the environment in which the agent *executes*.
    3. **The Speed/Security Paradox:** Developers and vendors are trading rigorous authorization checks for lower latency, creating a window of opportunity for attackers to reverse-engineer patches within 72 hours.

    ## 5. Corrective Action Plan (CAP)

    ### Immediate Technical Remediation
    * **Patch Deployment:** Ensure Claude Code is ≥ v2.1.90; verify Copilot August 2025 patches.
    * **Scope Reduction:** Transition Vertex AI to a "Bring Your Own Service Account" (BYOSA) model to enforce least privilege.

    ### Long-term Governance & Prevention
    * **Identity Inventory:** Integrate AI agent identities into CIEM (Cloud Infrastructure Entitlement Management) and CMDB systems.
    * **Zero Trust Input Policy:** Treat all repository metadata (branch names, PR descriptions, READMEs) as untrusted input for agentic execution.
    * **Non-Human PAM:** Implement Privileged Access Management (PAM) for AI agents, treating them with the same rigor as human privileged users (rotation, scoping, and session anchoring).
    * **Vendor Audits:** Mandate written documentation from vendors regarding identity lifecycle management and credential rotation policies during renewal cycles.
  6. Designlang is a powerful tool designed to extract complete design systems from any live URL using a headless browser. It goes beyond simple color picking by analyzing layout architectures, responsive behaviors across breakpoints, interaction states like hover and focus, and motion languages such as easing and spring physics. The tool generates over 17 different file types including W3C DTCG tokens, Tailwind configurations, shadcn themes, Figma variables, and typed React component stubs to bridge the gap between live websites and development environments.
    Key features include:
    - Automated extraction of design tokens (primitive, semantic, and composite layers)
    - Responsive analysis across multiple viewports
    - Interaction state capture for hover, focus, and active transitions
    - WCAG accessibility scoring with color remediation suggestions
    - Multi-platform support for iOS, Android, Flutter, and WordPress
    - Integration as an MCP server for AI coding agents like Cursor and Claude Code
    - Design drift detection and visual diffing capabilities
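For reference, the W3C DTCG token shape such a tool emits can be sketched in a few lines; the token names and values below are invented, not Designlang output.

```python
# Sketch of a W3C DTCG-style token tree: "$type"/"$value" keys, with a
# semantic token aliasing a primitive via the "{path}" reference syntax.
import json

tokens = {
    "color": {
        "brand": {
            "primary": {"$type": "color", "$value": "#3366ff"},
        },
        "text": {
            # Semantic layer referencing the primitive layer by alias.
            "accent": {"$type": "color", "$value": "{color.brand.primary}"},
        },
    },
    "spacing": {
        "md": {"$type": "dimension", "$value": "16px"},
    },
}

def dtcg_dump(tree: dict) -> str:
    """Serialize the token tree as a DTCG-style JSON document."""
    return json.dumps(tree, indent=2)
```

The primitive/semantic split shown here is the layering the entry describes: downstream targets (Tailwind, Figma variables) consume the semantic names while the primitives hold the raw values.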
  7. At GrafanaCON 2026, Grafana Labs announced significant updates including the launch of Grafana 13 and a major architectural overhaul for Loki. The new Loki design moves away from replication-at-ingestion toward using Kafka as a durability layer to reduce data duplication and improve query performance. Additionally, the company introduced GCX, a new CLI tool in public preview designed to integrate observability data directly into agentic development environments like Claude Code and Cursor, allowing engineers to resolve production issues without leaving their coding tools.
    Key announcements:
    - Loki rearchitected with Kafka to reduce storage overhead and improve query speed.
    - Introduction of GCX CLI for seamless observability integration within AI coding agents.
    - Launch of Grafana 13 featuring dynamic dashboards and expanded data source support.
    - New AI Observability product in public preview for monitoring LLM applications.
  8. This article explores the growing trend of using small language models (SLMs) to power autonomous AI agents locally on consumer hardware. It discusses how recent advancements in model efficiency allow these smaller, specialized models to perform complex reasoning and tool-use tasks previously reserved for much larger models. The guide covers the benefits of local deployment, such as privacy, reduced latency, and cost savings, while outlining technical strategies for implementing agentic workflows using frameworks like LangChain or AutoGPT with quantized SLMs.
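The tool-use loop such a local agent runs can be sketched with the model stubbed out; the JSON message protocol below is an assumption, and in practice the `model` callable would wrap a locally served quantized SLM rather than a test stub.

```python
# Sketch of an agentic tool-use loop. The model emits JSON messages:
# either {"tool": name, "args": [...]} or {"final": answer}. This
# protocol and the toy tools are illustrative assumptions.
import json
from typing import Callable

TOOLS = {
    "add": lambda a, b: a + b,      # illustrative tools
    "upper": lambda s: s.upper(),
}

def run_agent(model: Callable[[str], str], task: str, max_steps: int = 5) -> str:
    """Feed the task to the model; execute any tool call it emits."""
    transcript = task
    for _ in range(max_steps):
        msg = json.loads(model(transcript))
        if msg.get("final") is not None:
            return msg["final"]
        result = TOOLS[msg["tool"]](*msg["args"])   # dispatch the tool call
        transcript += f"\nTOOL_RESULT: {result}"    # feed result back in
    raise RuntimeError("agent exceeded step budget")
```

Bounding the loop with `max_steps` matters more for SLMs than for frontier models, since smaller models are likelier to loop on a malformed tool call.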
  9. This article demonstrates how to perform text summarization using the scikit-llm library, which provides a simple interface for utilizing large language models within a scikit-learn style workflow. The guide walks through installing the necessary dependencies and implementing both extractive and abstractive summarization techniques on sample text data.
    Key topics include:
    - Introduction to the scikit-llm library
    - Implementing abstractive summarization using LLMs
    - Using scikit-llm for text classification and clustering tasks
    - Practical code examples for integrating LLM capabilities into machine learning pipelines
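scikit-llm's summarizers call a hosted LLM and need an API key, so as a self-contained stand-in for the extractive technique the guide covers, here is a frequency-based extractive summarizer in the same scikit-learn fit/transform style. This is not scikit-llm's actual API, only an illustration of the workflow pattern.

```python
# Sketch of extractive summarization as a scikit-learn-style transformer:
# sentences are scored by the corpus frequency of their words, and the
# top-scoring sentences are kept in original order.
import re
from collections import Counter

class FrequencySummarizer:
    def __init__(self, n_sentences: int = 1):
        self.n_sentences = n_sentences

    def fit(self, X, y=None):
        return self      # stateless; present for pipeline compatibility

    def transform(self, X):
        return [self._summarize(doc) for doc in X]

    def _summarize(self, doc: str) -> str:
        sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
        freq = Counter(w.lower() for w in re.findall(r"\w+", doc))
        scored = sorted(sentences,
                        key=lambda s: -sum(freq[w.lower()]
                                           for w in re.findall(r"\w+", s)))
        top = set(scored[: self.n_sentences])
        return " ".join(s for s in sentences if s in top)
```

Swapping this transformer for an LLM-backed one is the point of the scikit-learn interface: the surrounding pipeline code stays unchanged.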
  10. OpenKB is an open-source command-line system designed to transform raw documents into a structured, interlinked wiki-style knowledge base using Large Language Models. Unlike traditional RAG systems that rediscover information with every query, OpenKB compiles knowledge once into a persistent format where summaries, concept pages, and cross-references are automatically maintained and updated.
    Key features and capabilities include:
    - Vectorless long document retrieval powered by PageIndex tree indexing.
    - Native multi-modality for understanding figures, tables, and images.
    - Broad format support including PDF, Word, Markdown, PowerPoint, HTML, and Excel.
    - Automated wiki compilation that creates summaries and synthesizes concepts across documents.
    - Interactive chat sessions with persisted history and Obsidian compatibility via wikilinks.
    - Health check tools (linting) to identify contradictions, gaps, or stale content within the knowledge base.
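The wiki-linting idea can be sketched as a dangling-wikilink check over compiled pages; the function names are illustrative, not OpenKB's API.

```python
# Sketch: extract Obsidian-style [[wikilinks]] from compiled pages,
# build a link graph, and flag references to pages that do not exist,
# a simple stand-in for OpenKB's health-check/linting pass.
import re

# Matches [[Target]] and [[Target|display text]], capturing the target.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]+)?\]\]")

def link_graph(pages: dict[str, str]) -> dict[str, list[str]]:
    return {title: WIKILINK.findall(body) for title, body in pages.items()}

def dangling_links(pages: dict[str, str]) -> set[str]:
    graph = link_graph(pages)
    known = set(pages)
    return {t for targets in graph.values() for t in targets if t not in known}
```

Because the compiled wiki is plain Markdown with wikilinks, the same files open directly in Obsidian, which is the compatibility the entry mentions.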

SemanticScuttle - klotz.me: tagged with "topic: programming tools and libraries"